Partial re-revert of #104336. Only JIT fixes are included. #105596
/azp run runtime-coreclr gcstress-extra, runtime-coreclr gcstress0x3-gcstress0xc
Azure Pipelines successfully started running 2 pipeline(s).
Figured out the reasons for the rare failures.
When we emit the GS cookie check, we do not have correct GC info in codegen. At this point we often do not track return registers or call arguments (if it is a tail call).
Here is an example of a repro. The key parts here are that the method is fully interruptible (due to JIT stress?) and has a GS cookie check. The fact that it also has a tail call after the check makes things more likely to fail, but I do not believe a tail call is strictly required to repro.
; Total bytes of code 92, prolog size 16, PerfScore 21.00, instruction count 23, allocated bytes for code 92 (MethodHash=ec4cfb07) for method System.Globalization.CompareInfo+SortHandleCache:.cctor() (MinOpts)
; ============================================================
; Assembly listing for method System.Collections.Generic.Dictionary`2[System.__Canon,long]:.ctor():this (FullOpts)
; Emitting BLENDED_CODE for generic ARM64 - Windows
; FullOpts code
; optimized code
; fp based frame
; fully interruptible
; No PGO data
; Final local variable assignments
;
; V00 this [V00,T00] ( 3, 3 ) ref -> x0 this class-hnd single-def <System.Collections.Generic.Dictionary`2[System.__Canon,long]>
; V01 tmp0 [V01,T01] ( 1, 1 ) int -> [fp+0x14] do-not-enreg[V] "GSCookie dummy"
;# V02 OutArgs [V02 ] ( 1, 1 ) struct ( 0) [sp+0x00] do-not-enreg[XS] addr-exposed "OutgoingArgSpace"
; V03 GsCookie [V03 ] ( 1, 1 ) long -> [fp+0x18] do-not-enreg[X] addr-exposed "GSSecurityCookie"
;
; Lcl frame size = 16
G_M46826_IG01: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG
stp fp, lr, [sp, #-0x20]!
mov fp, sp
movz x1, #0x5678
movk x1, #0x1234 LSL #16
movk x1, #0xDEF0 LSL #32
movk x1, #0x9ABC LSL #48
str x1, [fp, #0x18] // [V03 GsCookie]
;; size=28 bbWeight=1 PerfScore 4.50
G_M46826_IG02: ; bbWeight=1, gcrefRegs=0001 {x0}, byrefRegs=0000 {}, byref
; gcrRegs +[x0]
mov w1, wzr
;; size=4 bbWeight=1 PerfScore 0.50
G_M46826_IG03: ; bbWeight=1, extend
mov x2, xzr
;; size=4 bbWeight=1 PerfScore 0.50
G_M46826_IG04: ; bbWeight=1, extend
movz x3, #0xC4E0 // code for System.Collections.Generic.Dictionary`2[System.__Canon,long]:.ctor(int,System.Collections.Generic.IEqualityComparer`1[System.__Canon]):this
;; size=4 bbWeight=1 PerfScore 0.50
G_M46826_IG05: ; bbWeight=1, extend
movk x3, #0x3371 LSL #16
;; size=4 bbWeight=1 PerfScore 0.50
G_M46826_IG06: ; bbWeight=1, extend
movk x3, #0x7FF9 LSL #32
;; size=4 bbWeight=1 PerfScore 0.50
G_M46826_IG07: ; bbWeight=1, extend
ldr x3, [x3]
;; size=4 bbWeight=1 PerfScore 3.00
G_M46826_IG08: ; bbWeight=1, extend
movz xip0, #0x5678
;; size=4 bbWeight=1 PerfScore 0.50
G_M46826_IG09: ; bbWeight=1, extend
movk xip0, #0x1234 LSL #16
;; size=4 bbWeight=1 PerfScore 0.50
G_M46826_IG10: ; bbWeight=1, extend
movk xip0, #0xDEF0 LSL #32
;; size=4 bbWeight=1 PerfScore 0.50
G_M46826_IG11: ; bbWeight=1, extend
movk xip0, #0x9ABC LSL #48
;; size=4 bbWeight=1 PerfScore 0.50
G_M46826_IG12: ; bbWeight=1, extend
ldr xip1, [fp, #0x18] // [V03 GsCookie]
;; size=4 bbWeight=1 PerfScore 2.00
G_M46826_IG13: ; bbWeight=1, extend
cmp xip0, xip1
;; size=4 bbWeight=1 PerfScore 0.50
G_M46826_IG14: ; bbWeight=1, isz, extend
beq G_M46826_IG16
;; size=4 bbWeight=1 PerfScore 1.00
G_M46826_IG15: ; bbWeight=1, extend
bl CORINFO_HELP_FAIL_FAST
; gcrRegs -[x0] <==== x0 SHOULD STAY ALIVE HERE !!!
;; size=4 bbWeight=1 PerfScore 1.00
G_M46826_IG16: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, epilog, nogc
ldp fp, lr, [sp], #0x20
br x3
;; size=8 bbWeight=1 PerfScore 2.00
There are pieces of code that try to force the return register to be live, but they seem to miss the cases where there are two return registers.
Basically, not having the live arg inference causes issues like the one above. Having it causes other issues (like crashes on x64 when reporting registers that no longer contain GC refs).
Either the inference should work correctly, or this whole GS cookie check should not be interruptible. It is just a few instructions before the epilog anyway.
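To make the mismatch in the listing concrete, here is a minimal toy model of the two views of GC liveness at the `bl CORINFO_HELP_FAIL_FAST` call. It is only a sketch: `RegMask`, `GcView`, `reportedAtCall` and `isProtected` are hypothetical names made up for this illustration and are not the JIT's data structures. The assumption is that the emitter still knows `x0` holds a GC ref, codegen has already dropped it, and the call site reports whichever view is flowed to it.

```cpp
#include <cassert>
#include <cstdint>

// Toy model only: one bit per register that currently holds a GC reference.
using RegMask = std::uint32_t;
constexpr RegMask REG_X0 = 1u << 0; // 'this' in the listing above

// Hypothetical stand-in for "the set of live GC registers as seen by one layer".
struct GcView
{
    RegMask gcrefRegs = 0;
};

// Assumption for this sketch: a call emitted during the GS cookie check reports
// codegen's view when the info flows down, and the emitter's own view otherwise.
RegMask reportedAtCall(const GcView& codegen, const GcView& emitter, bool callFlowsCodegenInfo)
{
    return callFlowsCodegenInfo ? codegen.gcrefRegs : emitter.gcrefRegs;
}

// A register that is needed after the call is only safe across a GC
// if it was reported as live at the call site.
bool isProtected(RegMask reported, RegMask neededAfterCall)
{
    return (reported & neededAfterCall) == neededAfterCall;
}
```

With `emitter.gcrefRegs = REG_X0` and an empty codegen view, flowing codegen's info reports nothing live and `x0` is unprotected, which corresponds to the `gcrRegs -[x0]` line in the listing.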
What was saving us was the treatment of the first instruction of the epilog as uninterruptible, which is the wrong thing to do for regular methods that can do GC or can return (unlike [...]).
A simple fix would be to do [...]. Alternatively, I could tell the emitter to keep tracking its info when we call [...].
I think x86 could continue doing what it is doing.
/azp run runtime-coreclr gcstress-extra, runtime-coreclr gcstress0x3-gcstress0xc
Azure Pipelines successfully started running 2 pipeline(s).
I think this is ready for review. @dotnet/jit-contrib
@@ -594,8 +594,7 @@ bool emitter::emitGenNoGCLst(Callback& cb)
 emitter::instrDesc* id = emitFirstInstrDesc(ig->igData);
 assert(id != nullptr);
 assert(id->idCodeSize() > 0);
-if (!cb(ig->igFuncIdx, ig->igOffs, ig->igSize, id->idCodeSize(),
-ig->igFlags & (IGF_FUNCLET_PROLOG | IGF_FUNCLET_EPILOG | IGF_EPILOG)))
+if (!cb(ig->igFuncIdx, ig->igOffs, ig->igSize, id->idCodeSize(), ig->igFlags & (IGF_FUNCLET_PROLOG)))
This is the main part of the change. The epilog is no different from other No-GC regions and its first instruction should be interruptible.
See: https://github.com/dotnet/runtime/pull/104336/files#r1664973906
The rest of the changes is dealing with GS cookie check that is not always tracking GC info correctly and tweaking an assert in NativeAOT.
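For readers less familiar with this path, here is a rough sketch of what the flag change means for a no-GC region. It is a toy model under stated assumptions, not the real GC encoder: the enum values, `NoGcRange`, `noGcRange`, `isInterruptibleAt`, `coversFirstInstrOld` and `coversFirstInstrNew` are all hypothetical. The assumption is that the last callback argument tells the consumer whether the region also covers the offset of the group's first instruction.

```cpp
#include <cassert>
#include <cstdint>

// Flag names mirror the diff; the numeric values here are made up for the sketch.
enum IgFlags : std::uint32_t
{
    IGF_FUNCLET_PROLOG = 1u << 0,
    IGF_FUNCLET_EPILOG = 1u << 1,
    IGF_EPILOG         = 1u << 2,
};

// Half-open range [start, end) of code offsets where GC must not happen.
struct NoGcRange
{
    std::uint32_t start;
    std::uint32_t end;
};

// Assumed behavior: unless the first instruction is covered, the no-GC range
// begins after it, so the offset of the first instruction stays interruptible.
NoGcRange noGcRange(std::uint32_t igOffs, std::uint32_t igSize, std::uint32_t firstInstrSize, bool coverFirstInstr)
{
    assert(firstInstrSize > 0 && firstInstrSize <= igSize);
    std::uint32_t start = coverFirstInstr ? igOffs : igOffs + firstInstrSize;
    return {start, igOffs + igSize};
}

bool isInterruptibleAt(std::uint32_t offs, NoGcRange r)
{
    return offs < r.start || offs >= r.end;
}

// Before the change: prologs and epilogs of all kinds cover their first instruction.
bool coversFirstInstrOld(std::uint32_t flags)
{
    return (flags & (IGF_FUNCLET_PROLOG | IGF_FUNCLET_EPILOG | IGF_EPILOG)) != 0;
}

// After the change: only funclet prologs do.
bool coversFirstInstrNew(std::uint32_t flags)
{
    return (flags & IGF_FUNCLET_PROLOG) != 0;
}
```

Applied to the epilog group in the repro listing (offset 84 when summing the group sizes shown, size 8, first instruction 4 bytes), the old predicate makes offset 84 non-interruptible, while the new one leaves it interruptible and keeps only offset 88 inside the no-GC range.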
I think the GC information should be correct coming out of normal codegen (your change to allow GC on the epilog relies on that already). Did you understand what ends up breaking the GC information between finishing codegen for the
The code can be written in a platform agnostic way using |
I did not find good documentation on what [...].
The GC info tracked by the emitter is correct at this point, since it sees the assignments of returns and arguments, and normally that is sufficient. Since emit tracks correctly, everything works unless a GS cookie is involved. Generating the check ends up flowing GC info from codegen to emit, since there is a method call and a branch/label.
There is code that tries to force codegen to treat return registers and args as live. It refers to a Dev10 bug, and that bug hints that just making the GS cookie check uninterruptible could have been a simpler solution (at least that is how I read the bug; it does not tell a lot of the story).
What I see is that the code is also used on RISC platforms, but it does not handle multiple return registers where more than one may be returned. It can be fixed using [...].
The same deal applies to the arguments. In this case I am not sure it can be easily fixed/forced/patched. The same approach as in x86, which simply forces everything that is classified as a register argument to come alive, seems to force alive a bunch of stuff that does not even contain GC references at that point. Maybe there is a bug in that code, or maybe it should use the signature+ABI of the callee instead.
We are talking about a few instructions (the GS check) that are not part of the normal method body (i.e. user-written code), but are more like satisfying platform requirements about exiting the method.
BTW, another "fix" could be ensuring that the GS check does not flow the codegen info (which is not correct) down to emit (which still tracks correct info). It is possible to use a different kind of label (local label?) that would not flow, but forcing the call not to flow feels like it would be too invasive for how things are done currently (i.e. codegen knows and passes correct info at call sites, all calls kill certain registers, etc.).
In the end it feels like, while more complex solutions may exist, treating the GS check as the start of the epilog (for the purpose of GC tracking) is simpler with very little downside.
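As an illustration of the multi-register return point, here is a small sketch of what "force the return registers alive" would have to compute when more than one register is returned. Everything here is hypothetical (`RetKind`, `ReturnDesc`, `GcLiveSets`, `liveReturnRegs`, and the two-slot `x0`/`x1` layout); it is not the code in codegen, only a model of the idea that each return register has to be classified separately.

```cpp
#include <cassert>
#include <cstdint>

// Toy model only: one bit per register.
using RegMask = std::uint32_t;
constexpr RegMask REG_X0 = 1u << 0;
constexpr RegMask REG_X1 = 1u << 1;

// Hypothetical classification of what a return register carries.
enum class RetKind
{
    None,
    Int,
    GcRef,
    ByRef
};

// Hypothetical descriptor: up to two return registers (x0, then x1).
struct ReturnDesc
{
    RetKind regs[2];
};

struct GcLiveSets
{
    RegMask gcrefRegs;
    RegMask byrefRegs;
};

// Walk every return register, not just the first one, and mark it live
// in the set that matches its kind. Non-GC values are left out entirely.
GcLiveSets liveReturnRegs(const ReturnDesc& desc)
{
    constexpr RegMask retRegs[2] = {REG_X0, REG_X1};
    GcLiveSets live{0, 0};
    for (int i = 0; i < 2; i++)
    {
        if (desc.regs[i] == RetKind::GcRef)
        {
            live.gcrefRegs |= retRegs[i];
        }
        else if (desc.regs[i] == RetKind::ByRef)
        {
            live.byrefRegs |= retRegs[i];
        }
    }
    return live;
}
```

For a return of `{GcRef, ByRef}` this yields `x0` in the gcref set and `x1` in the byref set. Logic that only looks at the first slot would miss `x1`, and logic that marks every return register alive regardless of kind would report an `Int` slot as a GC ref, which is the kind of over-reporting described for the argument case.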
Another consideration is that even if we make codegen track returns/tail-args as live coming from [...].
Perhaps we should take the simpler solution now and follow up with more complex solutions later, if we even decide we want that.
I suspect the change codegen would need is to make this code run always: runtime/src/coreclr/jit/codegencommon.cpp Lines 7242 to 7256 in 6931b3b
This only runs for compIsProfilerHookNeeded right now (and we seem to clear the registers again after calling the profiler hook). Not sure why it tries so hard to keep the GC information incorrect.
I do not think tailcalls need any special handling.
IMO yes -- codegen and emit should generally agree on GC information at the boundaries. But I can be convinced to make this a .NET 10 item.
My thought exactly. It does not look like the right place to be when codegen loses track of GC info even in this edge case. It breaks the model that "upper layers track at fewer edges, but more precisely". Whether we want the "right" fix in 9.0 - not sure.
Thanks!!
Re: #102370 (comment)
The epilog is no different from other No-GC regions and its first instruction should be interruptible.
See: https://github.com/dotnet/runtime/pull/104336/files#r1664973906
The rest of the changes is dealing with GS cookie check that is not always tracking GC info correctly.
It is possible that this part was already problematic even before the change (i.e. it was missing multi-register returns on ARM64, tail calls might not have been tracking all live args correctly) - hard to tell if that was causing actual failures though. Even with the epilog interruptibility change, the failures were relatively rare GC+JIT stress scenarios.